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Abstract 

Background: An important objective of DNA microarray-based gene expression experimentation is determining inter- 
relationships that exist between differentially expressed genes and biological processes, molecular functions, cellular 
components, signaling pathways, physiologic processes and diseases. 

Results: Here we describe GeneMesh, a web-based program that facilitates analysis of DNA microarray gene 
expression data. GeneMesh relates genes in a query set to categories available in the Medical Subject Headings (MeSH) 
hierarchical index. The interface enables hypothesis driven relational analysis to a specific MeSH subcategory (e.g., 
Cardiovascular System, Genetic Processes, Immune System Diseases etc.) or unbiased relational analysis to broader 
MeSH categories (e.g., Anatomy, Biological Sciences, Disease etc.). Genes found associated with a given MeSH category 
are dynamically linked to facilitate tabular and graphical depiction of Entrez Gene information, Gene Ontology 
information, KEGG metabolic pathway diagrams and intermolecular interaction information. Expression intensity 
values of groups of genes that cluster in relation to a given MeSH category, gene ontology or pathway can be 
displayed as heat maps of Z score-normalized values. GeneMesh operates on gene expression data derived from a 
number of commercial microarray platforms including Affymetrix, Agilent and lllumina. 

Conclusions: GeneMesh is a versatile web-based tool for testing and developing new hypotheses through relating 
genes in a query set (e.g., differentially expressed genes from a DNA microarray experiment) to descriptors making up 
the hierarchical structure of the National Library of Medicine controlled vocabulary thesaurus, MeSH. The system 
further enhances the discovery process by providing links between sets of genes associated with a given MeSH 
category to a rich set of html linked tabular and graphic information including Entrez Gene summaries, gene 
ontologies, intermolecular interactions, overlays of genes onto KEGG pathway diagrams and heatmaps of expression 
intensity values. GeneMesh is freely available online at http://proteogenomics.musc.edu/genemeshA 



Background 

DNA microarray analysis typically involves determining 
inter-relationships that exist between differentially 
expressed genes and biological processes or categories. 
Such information is available in the form of gene ontolo- 
gies, which describe gene products in terms of their asso- 
ciated biological processes, cellular components and 
molecular function (i.e., Gene Ontology [1]). Although 
employed by numerous computational algorithms, these 
databases lack the full breadth of information contained 
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in the literature. To address this problem, approaches are 
emerging that utilize the body of information contained 
in the Medical Subject Headings (MeSH) [2] of the U.S. 
National Library of Medicine (NLM) for interpreting 
DNA microarray data [3,4]. MeSH is a hierarchically 
structured compilation of nearly 25,000 descriptors that 
include broad and specific headings/categories. MeSH 
categories are populated by articles from 4,800 of the 
world's leading journals indexed in the MEDLINE/ 
PubMed database. The National Center for Biotechnol- 
ogy Information (NCBI) offers a MeSH database that 
provides links to all PubMed citations that correspond to 
a given MeSH term. Furthermore, NCBI makes available 
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the Gene Links display feature that provides access to the 
genes (GenelDs) mentioned in all PubMed articles. 

Our motivation for this project was to develop a web- 
based system that would facilitate analysis of DNA 
microarray gene expression data so as to detect relation- 
ships between groups of differentially expressed genes 
and the categories available in the MeSH hierarchical 
index. Importantly, the system needed to analyze the 
entirety of genes in an expression dataset, not simply 
query the MeSH hierarchical index using single genes. 
Ideally, the system would facilitate hypothesis testing by 
determining if an experimental stimulus elicits specific 
effects on genes belonging to a particular MeSH category. 
Furthermore, the system would display heatmaps of 
expression intensity values in order to show the way in 
which multiple genes, associated with a particular MeSH 
category, behaved in response to the experimental stimu- 
lus. Finally, the system would execute rapid linking of 
genes to their meta information, including the informa- 
tion contained in Entrez Gene, Gene Ontology, KEGG 
pathway and intermolecular interaction databases. Here 
we report on the development of a system that meets 
these specifications. 

Implementation 

Program Design 

The GeneMesh user interface is written in both HTML 
and a Web 2.0 technology, jQuery [5]. The backend is 
written in Perl [6] and PHP [7] and uses a MySQL Data- 
base (version 5.0.45) [8] operating on an Apache [9] web 
server running a Linux Fedora 7 [10] operating system. 
The GeneMesh Database is populated through a multi- 
step process. First, a search algorithm performs iterative 
web-based queries to collect all PubMed IDs linked to 
individual MeSH terms contained in the NLM MeSH 
Trees file. Only those MeSH terms having three or more 
parent nodes are employed in this process. Next, Gene 
IDs associated with the collected PubMed IDs are 
obtained from NCBI (file gene2pubmed.gz), and then the 
Gene ID, PubMed ID and MeSH associations are stored 
in a MySQL (Sun Microsystems) database (i.e., Gen- 
eMesh Database). For each MeSH category, the PubMed 
IDs and Gene IDs are tabulated and used to calculate the 
average number of citations per unique Gene ID. Addi- 
tional information stored in the database includes gene 
ontology information for each Gene ID (obtained from 
the NCBI file, gene2go.gz), HomoloGene IDs (obtained 
from the NCBI file, homologene.data) and GeneMesh 
Database statistics (e.g., total number of unique MeSH 
terms associated with GenelDs, number of top-level 
MeSH categories populated with GenelDs and the total 
number of HomoloGene IDs represented). This informa- 
tion can be viewed using the html link, DB Stats. 



Input 

GeneMesh is configured to analyze DNA microarray 
gene expression data derived from a number of commer- 
cial microarray platforms, including Affymetrix, Agilent 
and Illumina. Microarray data is input as a Comma Sepa- 
rated Value (CSV) file containing the probe identifiers 
and log base 2 intensity values. The program operates on 
subsets of expression data (e.g., a set of differentially 
expressed genes having < 500 genes), but many features 
also operate on an entire microarray data set. 

Major input features of the program enable users to 
identify genes in a data set that associate with a specified 
MeSH category (e.g., major MeSH category: Disease; sub 
category: Congenital, Hereditary, and Neonatal Diseases 
and Abnormalities) or word/phrase (e.g., angiogenesis, 
DiGeorge syndrome, extracellular matrix or stem cells). 
Users may employ a filter to govern the stringency of this 
identification process. For example, the analysis may be 
confined to GenelD/MeSH associations for which Gene- 
IDs have citation counts that exceed the average for all 
GenelDs in a given MeSH category; this is done by select- 
ing '>Avg' from the pull down menu. Similarly, the analy- 
sis may be made more stringent by limiting the analysis to 
GenelD/MeSH associations for which GenelDs have cita- 
tion counts greater than one standard deviation above the 
mean; this is done by selecting the '>StDev' pull down. 
The user may also employ an unfiltered analysis in which 
all GenelD/MeSH associations are included; this is done 
by selecting 'Unfiltered' from the pull down menu. 

Results 

GeneMesh displays a table of all MeSH subcategories 
detected from the input DNA microarray data file (Figure 
1). GenelDs in each MeSH category are linked to infor- 
mation at NCBI Entrez Gene and a hypergeometric prob- 
ability value [11] for the MeSH category is provided. The 
magnitude of the hypergeometric p value can be used to 
sort output MeSH categories in ascending or descending 
order. In addition to the automated parsing of input genes 
in relation to MeSH categories, the program will also 
associate genes from the input gene set with a specific 
user-entered search term, which may or may not be a 
MeSH term. 

Through a series of action links, the program depicts 
genes associated with a given MeSH category in relation 
to Gene Ontology [1] information (i.e., biological pro- 
cesses, molecular functions, cellular components and 
pathways derived from the gene2go.gz file at NCBI 
Entrez Gene). Using the amCharts software bundle 
(amCharts.com) ontology information is displayed as ani- 
mated pie charts with clickable slices that are linked to 
NCBI Entrez Gene information. Through another action 
link (designated K), GenelDs associated with a given 
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Figure 1 Screenshot of the GeneMesh user interface Red tabs 
along the right margin highlight input and output action links includ- 
ing: 1, pulldown for selecting limits on inclusion of genes into MeSH 
categories based on citation statistics; 2, the total number of GenelDs 
associated with the given MeSH term; 3, the total number of Homolo- 
Gene IDs associated with the given MeSH term; 4, the number of genes 
from the input set that match genes associated with the indicated 
MeSH term (clicking on the number links to Entrez Gene summaries for 
each of the genes); 5, the hypergeometric probability of genes associ- 
ated with a given MeSH category (clicking on the arrow sorts output 
MeSH categories in ascending or descending order based on the mag- 
nitude of the hypergeometric p value); 6, the B, M or C action links pro- 
vide gene ontology information; 7, the K action link links to KEGG 
pathway diagrams for genes associated with a given MeSH category; 
8, links to protein interaction information displayed in a dynamically in- 
teractive network (N) or links to protein interaction information dis- 
played in a tabular format (T); and 9, the H action link display a Heat 
map of Z score-normalized hybridization intensity values for the set of 
genes associated with a given MeSH category. 



MeSH category are linked to Kyoto Encyclopedia of 
Genes and Genomes (KEGG) metabolic pathways and 
their diagrams [12]. An action link (designated VIV for 
view intensity values) permits viewing expression inten- 
sity values for a given gene name, gene symbol or probe 
ID in the input data set. This feature also allows heat 
maps to be generated for the selected genes. 

The program will define intermolecular interactions 
known to exist between the protein products of genes 
associated with a given MeSH category or subcategory. 
Protein interaction information is displayed either in a 
tabular format or in a dynamically interactive network 
generated using the GUESS program http:// 



graphexploration.cond.org/ . The identified interactions 
are based on data contained in the file interactions. gz 
from NCBI ftp://ftp.ncbi.nlm.nih.gov/gene/GeneRIF . 
Interaction data contained in this file is derived from 
multiple sources including BIND, EcoCyc, BioGRID and 
HPRD (see ftp://ftp.ncbi.nlm.nih.gov/gene/GeneRIF/ 
interaction sources and GeneRIF . 

GeneMesh produces heat maps of Z score-normalized 
hybridization intensity values of genes in a query set that 
are associated with a given MeSH category. Heat maps 
are generated using the heatmap.2 function in the R 
gplots package and probe pair annotations available 
through Bioconductor [13]. Dotplots are generated using 
the dotchart function in the R graphics package. 

In addition to relating input genes to MeSH categories, 
the GeneMesh program also performs a number of useful 
upfront operations on input expression data including 
viewing of all genes in the data set with links to Entrez 
Gene summaries (using the 'V action link), viewing of 
expression intensity values of genes in the data set (using 
the 'VIV action link), grouping of genes in relation to 
gene ontologies (including biological processes, molecu- 
lar functions and cellular components using the 'B', 'M' 
and 'C action links, respectively), mapping of GenelDs of 
individual genes or clusters of genes to KEGG pathway 
diagrams (using the 'K action link) and generating heat- 
maps of expression intensities (using the 'H action link). 

GeneMesh has other useful query and browsing fea- 
tures that do not require input expression data. For exam- 
ple, if a user is interested in identifying genes associated 
with a particular process, the "Group genes to Mesh terms 
associated with a word/phrase" feature allows a user to 
input a term (e.g., asthma) and obtain a list of Mesh cate- 
gories and their associated GenelDs. 

A diagram of the flow of data into the GeneMesh pro- 
gram together with operation options and outputs is 
depicted in Figure 2. An example of a GeneMesh analysis 
of a group of differentially expressed genes is given in an 
Additional Material file (Additional File 1). The Gen- 
eMesh web site also provides access to tutorials, which 
are offered in both html and video formats. 

Conclusion 

In this paper we describe a new DNA microarray data 
analysis tool, GeneMesh, which can accelerate discovery 
by identifying significant inter-relationships that exist 
between differentially expressed genes and MeSH catego- 
ries. The fact that the user can specify that an analysis be 
limited to a specific subcategory (e.g., Cardiovascular dis- 
eases) or a broad category (e.g., Diseases) allows for 
biased and unbiased analyses. Furthermore, through the 
use of a robust set of action link features the user can 
quickly see how genes associated with a MeSH category 
relate to biological processes, molecular functions, cellu- 
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lar components and signaling pathways. Through access 
to heatmaps the user can readily see how the genes in any 
given MeSH categories or any related ontological cate- 
gory behave in response to the experimental stimuli used 
to generate the original differentially expressed gene data 
set. 

A limited number of online tools can analyze gene(s) 
based on MeSH associations. These tools, such as Gen- 
doo [14], MeSHer [15] and HuGE Navigator [16], have 
varying capabilities for data input, results browsing and 
significance analysis (Table 1). GeneMesh is distinct in its 
user interface (i.e., Web 2.0) and analysis features, com- 
bining the abilities to use gene identifiers as input data (as 
opposed to microarray platform-specific identifiers), per- 
mit upload of expression data for the gene set, and exe- 
cute a significance analysis that can be adjusted for 
stringency. Furthermore, GeneMesh offers many unique 
options for browsing the analysis results in graphical or 
tabular format, either as a whole or partitioned according 
to other functional features. 

Availability and requirements 

Project name: GeneMesh 

Project homepage: http://proteogenomics.musc.edu/ 
genemesh/ 

Operating system: User interface: Platform indepen- 
dent; Server side: Linux 



Table 1 : Outline of features for GeneMesh and other web tools with MeSH analysis capabilities. 

Feature GeneMesh Gendoo MeSHer HuGE 

Navigator 



Map gene(s) to MeSH categories" 


Yes 


Yes 


Yes 


Yes 


Browse MeSH relationships' 1 


Yes 


Yes 


Yes 


Yes 


Microarray platform independence" 1 


Yes 


Yes 




Yes 


Capable of analyzing a group of genes d 


Yes 


Yes 


Yes 




Significance analysis of MeSH categories 5 


Yes 


Yes 


Yes 




Significance analysis threshold is adjustable 1 " 


Yes 




Yes 




Expression data uploads 


Yes 








Graphical display of expression data (e.g., heatmaps, tables) 


Yes 








MeSH output can be viewed by functional attributes (e.g., KEGG 


Yes 









pathway, Gene Ontology, NCBI interaction)'' 

Web 2.0 design Yes 
a Software identifies links between MeSH categories and genes 

b Software offers a browser feature to navigate through gene associations and/or the MeSH framework 

c Software analyzes based on gene name, symbol or ID as opposed to a sequence identifier associated with a specific microarray platform 
d Software can analyze a group of input genes as opposed to a single gene 

e Software performs a statistical analysis to identify significant MeSH categories for the input gene set 
'Threshold for significance analysis can be adjusted to govern stringency 
a Software will upload expression data together with the gene set 

h Output is presented to facilitate browsing of other functional attributes, including KEGG pathway, Gene Ontology, and NCBI interactions 
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Figure 2 Diagram of the flow of input and output information us- 
ing the GeneMesh program. After registration a user can upload a 
microarray dataset in the .CSV format and select a variety of action op- 
erations (boxed in the flowchart). Registered users may also enter a list 
of GenelDs to obtain MeSH terms associated with genes in the list or 
they may enter a word or phrase and obtain associated MeSH terms 
and genes. Dotted lines in the flowchart connect operations that can 
be performed without uploading a data file. The various types of infor- 
mation outputs are shaded in grey. 
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Programming language: Perl, R, PHP, HTML and 
j Query 

Other requirements: Web browser (supporting 
JavaScript) 

Any restrictions to use by non-academics: License 
needed 

For reviewers of the software: To enable reviewers to 
test the GeneMesh software in a way that preserves their 
anonymity we have created a demo account that can be 
accessed at http://cbrc.musc.edu/homepage/jani/gen- 
emesh/index.html by entering the username 
'demo@demo.com' and the password 'testgenemesh. 



14. Gendoo [http://aendoo.dbcls.jp/l 

15. MeSHer [http://compbio.dfci. harvard.edu/mesher.htmll 

16. HuGE Navigator [http://huaenavigator.net/l 
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Additional material 



Additional file 1 Supplementary information An example of a Gen- 
eMesh analysis conducted on a list of differentially expressed genes. 
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